6d0bf1265ea9635fb4f9d56f16d7efb2-Supplemental-Conference.pdf

Neural Information Processing Systems

Supplementary Materials for "Don't be so Monotone: Relaxing Stochastic Line Search in Over-Parameterized Models"

Appendix A The Algorithm
Appendix B Convergence Rates
Appendix B.1 Rate of Convergence for Strongly Convex Functions
Appendix B.2 Rate of Convergence for Convex Functions
Appendix B.3 Rate of Convergence for Functions Satisfying the PL Condition
Appendix B.4 Common Lemmas
Appendix B.5 The Polyak Step Size is Bounded
Appendix C Experimental Details
Appendix D Plots Completing the Figures in the Main Paper
Appendix D.1 Comparison between PoNoS and the State-of-the-Art
Appendix D.2 A New Resetting Technique
Appendix D.3 Time Comparison
Appendix D.4 Experiments on Convex Losses
Appendix D.5 Experiments on Transformers
Appendix E Additional Plots
Appendix E.1 Study on the Choice of c: Theory (0.5) vs Practice (0.1)
Appendix E.2 Study on the Line Search Choice: Various Nonmonotone Adaptations
Appendix E.3 Zoom in on the Amount of Backtracks
Appendix E.4 Study on the Choice of η

In this section, we give the details of our proposed algorithm PoNoS. Training machine learning models (e.g., neural networks) entails solving the finite-sum problem min_{w ∈ R^d} f(w) := (1/n) Σ_{i=1}^n f_i(w). Before that, we establish the following auxiliary result. The following lemma shows the importance of the interpolation property. Lemma 4. We assume interpolation and that f ... Let us now analyze case 2). Let us now show that b < 1. In Appendix B.2 (Rate of Convergence for Convex Functions), we prove an O(1/T) rate. The above bound will now also be proven for case 2).
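The fragments above name the two ingredients of PoNoS: a Polyak step size used as the initial step, and a nonmonotone relaxation of the stochastic Armijo line search. A minimal sketch of how these ingredients typically combine is given below; the function name ponos_sketch, the callable loss_and_grad, the moving-max nonmonotone reference, and the parameters c, beta and window are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def ponos_sketch(params, loss_and_grad, batches, n_epochs=10,
                 c=0.1, beta=0.5, window=10, max_backtracks=30, eps=1e-10):
    """Sketch: stochastic Polyak initial step + nonmonotone Armijo backtracking."""
    recent = []  # recent mini-batch losses, used as the nonmonotone reference
    for _ in range(n_epochs):
        for batch in batches:
            loss, grad = loss_and_grad(params, batch)
            g2 = float(np.dot(grad, grad))
            if g2 < eps:
                continue  # gradient vanished on this batch, nothing to do
            # Polyak-type initial step size (interpolation: per-batch optimum ~ 0)
            eta = loss / (c * g2)
            # Nonmonotone reference value: max over a window of recent losses
            recent.append(loss)
            ref = max(recent[-window:])
            # Backtrack until the relaxed (nonmonotone) Armijo condition holds
            for _ in range(max_backtracks):
                new_loss, _ = loss_and_grad(params - eta * grad, batch)
                if new_loss <= ref - c * eta * g2:
                    break
                eta *= beta
            params = params - eta * grad
    return params
```

Comparing the new loss against the maximum over a window of recent losses, rather than against the current loss alone, is what makes the search nonmonotone: occasional increases of the mini-batch loss are accepted, which typically reduces the number of backtracking steps needed.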



Federated Submodel Optimization for Hot and Cold Data Features
Yucheng Ding

Neural Information Processing Systems

The global model in federated optimization typically contains a large and sparse embedding layer, while each client's local data tend to interact with only a subset of the features, so each client updates only a small submodel holding the feature-related embedding vectors.
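The abstract describes training only the embedding rows that a client's local features actually touch. Below is a minimal sketch of that submodel extraction and merge step, assuming a submodel is simply the slice of the global embedding table indexed by the client's local feature ids; the function names extract_submodel and merge_submodel and the plain overwrite merge are illustrative, not the paper's federated protocol.

```python
import numpy as np

def extract_submodel(embedding_table, feature_ids):
    """Materialize only the embedding rows touched by a client's local data."""
    ids = np.unique(feature_ids)
    return ids, embedding_table[ids].copy()

def merge_submodel(embedding_table, ids, updated_rows):
    """Write the client's updated rows back into the global embedding table."""
    embedding_table[ids] = updated_rows
    return embedding_table

# Hypothetical usage: a client whose local data only touches features {2, 7, 11}
rng = np.random.default_rng(0)
global_table = rng.standard_normal((100_000, 16)).astype(np.float32)
ids, sub = extract_submodel(global_table, np.array([2, 7, 7, 11]))
sub -= 0.01 * rng.standard_normal(sub.shape)  # stand-in for local training on the submodel
global_table = merge_submodel(global_table, ids, sub)
```

The point of the slicing is communication and memory: the client downloads and uploads only len(ids) embedding rows instead of the full, mostly unused table.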



A Proofs

Neural Information Processing Systems

A.1 Nonconvex stochastic optimization. We give proofs of the theorems in Section 3. We first give some lemmas. Following the proofs in [60], we introduce the definition of a supermartingale. Since r ∈ (0.5, 1), it follows that the number of iterations N needed is at most O(...). To prove Theorem 5, we first prove the following lemma. Suppose that Assumptions 1 and 2 hold. When neither regularization nor damping is used, i.e. ...
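The snippet mentions the definition of a supermartingale without stating it; for reference, the textbook notion is sketched below in LaTeX. This is the standard definition, given here as an assumption about what [60] uses; the precise variant in that reference may differ.

```latex
% Textbook definition, stated for reference; the exact variant used in [60] may differ.
% Assumes \usepackage{amsmath,amssymb} and an amsthm-style 'definition' environment.
\begin{definition}[Supermartingale]
Let $(\mathcal{F}_k)_{k \ge 0}$ be a filtration and let $(X_k)_{k \ge 0}$ be an
integrable sequence of random variables with $X_k$ measurable with respect to
$\mathcal{F}_k$. The sequence is a supermartingale if
\[
  \mathbb{E}\left[\, X_{k+1} \mid \mathcal{F}_k \,\right] \le X_k
  \qquad \text{for all } k \ge 0.
\]
If in addition $X_k \ge 0$ for all $k$, the supermartingale convergence theorem
guarantees that $X_k$ converges almost surely to a finite limit.
\end{definition}
```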